IxVM kernel: cut Aiur FFT cost on reduction- and check-heavy shards #438
Merged
Expr.Share resolution and universe lookups walked the sharing/univ lists with `list_lookup_u64` (per-step `u64_is_zero` + `relaxed_u64_pred`). Switch to:
- universe lookups: `list_lookup(univs, flatten_u64(idx))`, a cheap per-step field subtraction instead of the U64 predecessor.
- Expr.Share (Convert convert_expr + Ingress deref_share): `let ListNode.Cons(e, _) = load(list_drop(sharing, flatten_u64(idx)));`. list_drop returns the sublist *pointer*; repeated Share lookups collapse to shared memo entries, so the walk is near-free.

Measured on `lake exe ix check --ixe init.ixe --ixes init.ixes --shard 10` (Init, owned=3, closure=676): total FFT 3.89G -> 2.11G. The merged pointer-list walk drops from 1.93G (list_lookup) to 5.8M (list_drop, 0.27%). (Measurement run had in-circuit blake3 verification disabled; not committed.)
The primitive-dispatch gauntlet in whnf compares each Const head against ~50 known primitive addresses, so address_eq is the single hottest circuit. The full 32-limb compare ran at width 109 on every call. Split it: `address_eq` now compares limb 0 only and rejects (the common case: a single differing limb proves inequality), delegating the remaining 31-limb compare to a separate `address_eq_tail`. Because Aiur charges a function's full width on every one of its rows, the cold path must be a separate function (a nested match in address_eq changes nothing, measured identical); as its own circuit, address_eq_tail's height is only the rare limb-0-match rows.

Measured on `lake exe ix check --ixe init.ixe --ixes init.ixes --shard 10`:
- address_eq: width 109 -> 80, FFT 509.8M -> 374.1M
- address_eq_tail (new): width 108, height 1683, FFT 1.9M
- total shard FFT: 2.106G -> 1.973G

Limb-0 is the sweet spot: hot height (259991) dominates tail height (1683), so pulling a second limb forward (N=2) widens the hot circuit by more than it saves in the tail (measured 1.976G, worse). Result is identical to a full 32-limb compare (sound). (Measurement run had in-circuit blake3 verification disabled; not committed.)
whnf_with_spine is the single hottest circuit on reduction-heavy consts: width 89 charged on every one of its 3.36M rows. The width was set by two arms that only a minority of rows reach: the Const-head dispatch (delta/iota/quot/prim gauntlet) and the Proj-head reduction. Factor both into their own functions (`whnf_const_head`, `whnf_proj_head`). Because Aiur charges a function's full width on every row, the ~76% of steps that are App/Lam/Let now run at the narrow residual width instead of 89; the wide dispatch only taxes its own (smaller) row count.

Measured on `lake exe ix check --ixe init.ixe --ixes init.ixes --shard 9` (1 reduction-heavy owned const):
- whnf_with_spine: width 89 -> 34, FFT 6.48G -> 2.48G
- whnf_const_head (new): width 77, height 821490, FFT 1.24G
- whnf_proj_head (new): width 56, height 3452, FFT 0.002G
- total shard FFT: 45.54G -> 42.78G (-6.1%)

The Proj arm is rare (3452 rows) but was nearly half the width; extracting it dropped whnf_with_spine 65 -> 34 for ~zero relocated cost. Pure refactor.
Replace the tail-recursive `collect_spine_go(e, acc)` (and its verbatim copy `collect_spine_simple_go`) with one non-tail `collect_spine(e)` in the kernelTypes block. Repoint all callers (whnf, primitive, inductive_check, def_eq); delete both old copies. Keyed on `e` alone, it now dedups shared sub-spines (0 -> 2.19M cache hits).

Measured on `lake exe ix check --ixe init.ixe --ixes init.ixes --shard 9`:
- collect_spine: 5.38M rows / 2.65G -> 2.17M rows / 0.96G
- list_snoc.Ptr: +0.96G (new)
- total shard FFT: 42.78G -> 41.67G
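To illustrate why the accumulator blocked reuse, here is a minimal Python sketch (not the Aiur source). It assumes memoization keyed on the full argument tuple, uses hypothetical `Const` / `App` node types, and models the memo with a plain dict; only the names `collect_spine` and the `(e, acc)` shape come from the commit.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Const:
    name: str

@dataclass(frozen=True)
class App:
    fn: object
    arg: object

def run_tail(exprs):
    """Accumulator version: the memo key is (e, acc)."""
    memo, hits = {}, 0
    def go(e, acc):
        nonlocal hits
        key = (e, acc)
        if key in memo:
            hits += 1
            return memo[key]
        res = go(e.fn, (e.arg,) + acc) if isinstance(e, App) else (e, acc)
        memo[key] = res
        return res
    return [go(e, ()) for e in exprs], hits

def run_non_tail(exprs):
    """Non-tail version: the memo key is e alone; args are appended on return."""
    memo, hits = {}, 0
    def collect(e):
        nonlocal hits
        if e in memo:
            hits += 1
            return memo[e]
        if isinstance(e, App):
            head, args = collect(e.fn)
            res = (head, args + (e.arg,))  # the snoc step
        else:
            res = (e, ())
        memo[e] = res
        return res
    return [collect(e) for e in exprs], hits

f, a, b, c = (Const(n) for n in "fabc")
shared = App(f, a)                      # sub-spine shared by both expressions
exprs = [App(shared, b), App(shared, c)]
tail_out, tail_hits = run_tail(exprs)
flat_out, flat_hits = run_non_tail(exprs)
assert tail_out == flat_out             # same (head, args) results
print(tail_hits, flat_hits)             # 0 1
```

In the accumulator version the shared node `f a` is visited with two different `acc` values, so its two visits never share a key; keyed on `e` alone, the second visit is a hit.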
check_all_skipping tested each closure const against the assumption-leaf list via addr_in_list, an O(N) linear address_eq scan, the single largest address_eq source on sharded checks (1.81M calls on Init shard 9, more than all primitive dispatch combined). Build an RBTreeMap skip-set once (keyed on the first 4 address bytes), then test membership with one rbtree lookup + one confirming address_eq. Key collisions overwrite: sound because a missed skip only rechecks a frontier const, and the confirming address_eq rules out false positives.

Measured on `lake exe ix check --ixe init.ixe --ixes init.ixes --shard 9`:
- address_eq: 2.07M rows / 3.48G -> 274K rows / 0.40G
- rbtree build + lookups: +~0.46G
- total shard FFT: 41.67G -> 37.84G
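The collision argument can be sketched in Python. This is an illustration under stated assumptions: a dict stands in for the RBTreeMap, addresses are 32-byte `bytes` values, and `build_skip_set` / `should_skip` are hypothetical names.

```python
def build_skip_set(leaves):
    """Map the first 4 address bytes to the full address; a colliding key overwrites."""
    skip = {}
    for addr in leaves:
        skip[addr[:4]] = addr
    return skip

def should_skip(skip, addr):
    """One keyed lookup plus one confirming full-address compare."""
    candidate = skip.get(addr[:4])
    return candidate is not None and candidate == addr

a = bytes([1, 2, 3, 4]) + bytes(28)
b = bytes([1, 2, 3, 4]) + bytes([7] * 28)   # same 4-byte key as a: overwrites it
c = bytes([9, 9, 9, 9]) + bytes(28)
stranger = bytes([1, 2, 3, 4]) + bytes([5] * 28)  # shares the key, not a leaf

skip = build_skip_set([a, b, c])
assert should_skip(skip, b) and should_skip(skip, c)
assert not should_skip(skip, a)         # missed skip: a is merely rechecked
assert not should_skip(skip, stranger)  # confirming compare blocks a false positive
```

The only failure mode is the overwritten leaf `a`, which is checked again instead of skipped; no address outside the leaf list is ever skipped.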
try_nat_linear_rec only folded `Nat.rec base (λ_ ih => succ ih) (Lit n)` when the base was also a literal; a symbolic base declined and fell through to n iota steps that materialize succ^n(base). Reduce the symbolic-base case directly to the offset primitive `Nat.add base (Lit n)` instead. This keeps the value in the same compact `base + n` form a literal already has, so def-eq comparing it converges instead of descending n unary succ layers (the UTF-8 `x + 0xC0` reduction class). Offset magnitude stays KLimbs; only the nat_add top-position index is a (small) G. Measured on a `x + D = Nat.rec x succ-step D` benchmark: the unary def-eq descent collapses (nat_succ_of D -> ~constant), ~-15% total FFT at D=512. The remaining cost is the in-whnf chain reduction (substitution), unchanged here.
Reducing `Nat.add x (Lit n)` (symbolic x) delta-unfolded into a succ^n(x) tower, and the nat_succ dispatch then whnf-cascaded the whole chain: O(n) substitution per reduction, the dominant cost on Nat-arithmetic proofs (the UTF-8 codec class). Two coordinated changes:
- whnf (try_nat_offset_stuck): leave `Nat.add base (Lit n)` (non-literal base, nonzero n) stuck as a compact offset instead of unfolding it.
- def-eq (nat_offset_of + try_def_eq_nat): decompose each side to base + KLimbs offset (reading `Nat.add base (Lit m)` in O(1), peeling the few succs whnf exposes) and decide `base_a ≟ base_b` only when offsets are equal; otherwise fall back.

All magnitudes stay KLimbs (no Goldilocks collapse). Measured on `x + D = Nat.rec x succ-step D` (depths 64..512): total FFT goes from O(D) (328M at D=512) to a depth-independent ~28.5M (-91% at D=512). Full `lake test -- --ignored ixvm` passes (297 tests; BAD cases still rejected); a recursor-on-symbolic-offset completeness test reduces correctly.
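A toy Python model of the def-eq side, as a sketch only. Terms are tagged tuples (a hypothetical encoding), Python ints stand in for KLimbs, and `base_eq` stands in for the recursive def-eq call; the function names follow the commit message but the bodies are illustrative.

```python
def nat_offset_of(t):
    """Decompose t into (base, offset)."""
    offset = 0
    while t[0] == "succ":                    # peel the few succs whnf exposes
        offset += 1
        t = t[1]
    if t[0] == "add" and t[2][0] == "lit":   # read Nat.add base (Lit m) in one step
        return t[1], offset + t[2][1]
    return t, offset

def try_def_eq_nat(a, b, base_eq):
    """True/False when decided here, None to fall back to the generic path."""
    base_a, off_a = nat_offset_of(a)
    base_b, off_b = nat_offset_of(b)
    if off_a != off_b:
        return None                          # offsets differ: do not decide here
    return base_eq(base_a, base_b)

x, y = ("var", "x"), ("var", "y")
lhs = ("add", x, ("lit", 0xC0))
rhs = ("succ", ("add", x, ("lit", 0xC0 - 1)))   # one exposed succ over x + 191
structural = lambda p, q: p == q                # stand-in for recursive def-eq

assert nat_offset_of(rhs) == (x, 192)
assert try_def_eq_nat(lhs, rhs, structural) is True
assert try_def_eq_nat(lhs, ("add", y, ("lit", 192)), structural) is False
assert try_def_eq_nat(lhs, ("add", x, ("lit", 5)), structural) is None
```

The comparison never walks a succ tower: both sides are reduced to a base and one integer, and differing offsets defer to the generic path instead of answering "not equal".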
`Nat.div x (Lit k)` / `Nat.mod x (Lit k)` with a symbolic base delta-unfolded the division algorithm and expanded hugely, even though the result is irreducible for a symbolic base. `Nat.shiftRight x k` unfolds to k nested `Nat.div _ 2`, so a symbolic `>>>` materialized a deep, super-linear tower — the dominant cost in the UTF-8 codec's `c >>> 6/12/18` encoding. Generalize `try_nat_offset_stuck` (which already keeps `Nat.add base (Lit n)` compact) to also leave `Nat.div`/`Nat.mod base (Lit k)` (k ≥ 2) stuck. Thresholds keep `x/1 = x` and `x/0 = 0` reducing. Both `x >>> 18` and `(x >>> 9) >>> 9` then reduce to the same nested-stuck-div form, so def-eq stays structural and the identity still holds. Measured on `x >>> 18 = (x >>> 9) >>> 9`: total FFT 15.65G -> 455M (-97%), now depth-independent (matches `x >>> 6`). No regression on the Nat.add reproducers; full `lake test -- --ignored ixvm` passes (297 tests).
Ingress deserialization (parsing every closure const's Ixon bytes into Expr trees) is the dominant cost when checking large-closure constants. Two width cuts on the hottest parsers:
- get_expr: the Let arm (three recursive get_expr calls) set the function width though Let is rare. Factored into get_expr_let: width 102 -> 75.
- get_app_telescope: passed `func` by value (the wide Expr union) on every recursion step. Pass it by `&Expr`, loaded only at the base case: width 95 -> 78.

Measured by profiling ingress of the `ByteArray.utf8DecodeChar?…` closure (2000 consts): get_expr 766M -> 563M, get_app_telescope 477M -> 392M, total ingress FFT 3.13G -> 2.85G. Full `lake test -- --ignored ixvm` passes (297).
Key whnf / k_infer / k_is_def_eq memoization on the loose-bvar-reachable suffix of the local context (ctx_trim / ctx_reachable), mirroring Rust's whnf_key / infer_key / def_eq_ctx_key. A closed subterm (lbr 0) keys to the empty context and shares its result across every binder depth; eager substitution is unchanged. Fast paths (closed -> Nil, full-reach -> as-is) keep the boundary cheap. FFT (lake exe ix check <const> --stats-out): Nat.add_comm 58.28M -> 57.90M (-0.66%); Int.emod_emod_of_dvd 5.090G -> 4.691G (-7.85%); Init shard 10 4.738G -> 4.721G (-0.37%). 297 ixvm tests pass.
parseName can't rebuild a private name (_private.M.0.foo) from its display string -- the marker / scope-index components don't round-trip through naive dot-splitting. Add toString fallbacks: resolveName for the compiled-in env and resolveIxeAddr for a .ixe env, so check and prove (which share forEachClaim) resolve names like _private.Init.Data.Vector.Extract.0.Vector.extract_append._proof_1 across both the compiled-in and --ixe paths.
whnf_apply_beta substituted one spine arg per expr_inst1, re-walking the (shrinking) body for each, so a K-arg application `(λλλ.body) x y z` walked the body K times. Mirror Rust whnf.rs:541-567: peel the whole telescope (peel_beta) and substitute all consumed args in ONE expr_inst_many walk (simul_subst semantics). Single-arg betas stay on the cheap expr_inst1 path, so short telescopes don't pay the list overhead (why the earlier unconditional simul_subst regressed). Hot/cold split expr_inst_many_walk's BVar arm (list_length / list_lookup / expr_lift) into expr_inst_many_bvar so the hot App/Lam walk stays narrow. Substitution was 53% of FFT on omega-heavy proofs. FFT (ix check <const>): _private.…Vector.extract_append._proof_1 56.64G -> 38.31G (-32.4%); Nat.add_comm 57.90M -> 56.17M; Init shard 10 4.721G -> 4.668G (-1.1%); shard 9 16.346G -> 16.314G. 297 ixvm tests pass.
whnf_const_head ran 5 try_* dispatchers (nat / str / bitvec / native / decidable) in sequence on every Const head, each comparing the head address against its family's primitive addresses — wasted on the common non-primitive head, and worse, several try_* speculatively WHNF their args (to test for literals / decidables), cascading into deep reductions that are then thrown away. Factor the chain into try_address_primitives and gate it behind prim_any_addr: a memoized (per head address) sum of the 43 head-dispatched primitive address_eq checks — 1 only for a real primitive, so a 0 skips the whole gauntlet straight to projection-definition / delta. Validated by a differential assert (prim_any_addr must be 1 whenever the gauntlet matched) run over the 297-test suite + Int.emod_emod_of_dvd + Vector.extract_append._proof_1; it caught two missing addresses (string_utf8_byte_size, punit_size_of_1), then passed clean across millions of reductions. Assert removed after validation. FFT (ix check <const>): _private.…Vector.extract_append._proof_1 38.31G -> 28.02G (-26.8%); Int.emod_emod_of_dvd 4.69G -> 3.97G (-15%); Nat.add_comm 56.17M -> 56.06M; Init shard 9 16.314G -> 16.279G; shard 10 4.668G -> 4.659G. 297 ixvm tests pass.
Are any of these improvements able to be ported to the Rust kernel implementation? Also, can we run execution out-of-circuit on Init, Lean & Mathlib?
johnchandlerburnham
approved these changes
Jun 12, 2026
Summary
Thirteen commits in the Aiur IxVM kernel: ten optimization areas, each
targeting a specific hotspot found by profiling per-function FFT cost on real
shards or single constants of a compiled `Init` environment, plus a CLI
ergonomics fix for resolving private `Lean.Name` arguments. No semantics change:
every shard still checks without error, and the full
`lake test -- --ignored ixvm` suite passes (297 tests) at every step.

Cumulative effect on a reduction-heavy shard (Init shard 9, one owned constant,
deep reduction): 45.5G → 16.6G total FFT (−64%). The structural / keyed-lookup
areas (1–5) account for 45.5G → 37.8G; the Nat-reduction area (6) takes it the
rest of the way by collapsing two symbolic-arithmetic blowups from O(n) / O(n²)
to O(1). That same area makes a previously intractable proof class (the UTF-8
codec lemma `ByteArray.utf8DecodeChar?_utf8EncodeChar_append`, whose checking
spawns deep `Nat.add x 0xC0` succ-towers and `c >>> 6/12/18` division-towers)
reduce in constant rather than tower-depth cost. The deserialization (7),
memoization (8), substitution (9), and primitive-gauntlet (10) areas then cut
ingress and reduction cost across all consts.

Those last areas make another previously intractable proof checkable:
`Vector.extract_append._proof_1` (an `omega`-backed `Vector.extract` / `++` lemma
whose check is ~50% beta-substitution) went from OOM (>14G RAM) to 56.6G FFT
(area 8, which lets it complete at all) to 38.3G (area 9, −32%) to 28.0G
(area 10, a further −27%), a 50% reduction from the first completing run.
Method
Built a `.ixe` of the `Init` closure, profiled and sharded it
(`ix profile` → `ix shard --shards 64`), then ran `ix check` per shard or per
constant and read the per-function FFT table (`--stats-out`). Shard 10
(check-heavy: lookups, address compares, deserialize) and shard 9
(reduction-heavy: whnf, substitution, spine collection) exercise different
profiles; small hand-built or hand-picked reproducers (`x + D = …`,
`x >>> D = …`, `Int.emod_emod_of_dvd`) isolate and scale individual blowups.

Aiur cost model behind the changes: FFT ≈ `per-function width × row height`,
charged at the function's max-arm width on every row. Levers: (a) factor a
wide/rare arm into its own circuit; (b) cut row count via better memoization;
(c) replace linear scans with keyed lookups; (d) keep symbolic primitives
compact so they never materialize a tower.
Areas
1. Cheaper `Expr.Share` / universe lookups (commit `3774860`).
The sharing-table and universe-array lookups walked the list with
`list_lookup_u64` (per-step U64 predecessor). Switched to a field index and, for
`Expr.Share`, to `list_drop`, which returns the sublist pointer, so repeated
lookups dedup in the memo cache. Shard 10: total FFT 3.89G → 2.11G; the merged
pointer-list walk fell from 1.93G to 5.8M.
2. Hot/cold split of `address_eq` (commit `69b6936`).
The primitive-dispatch gauntlet compares each `Const` head against ~50 hardcoded
addresses, making `address_eq` (a full 32-limb compare, width 109) the hottest
circuit. Almost every comparison differs at limb 0. Split it: `address_eq`
compares limb 0 only and delegates the rest to `address_eq_tail`, whose height is
just the rare limb-0 matches. (A nested match in the same function does nothing:
Aiur charges full width per row regardless of arm, so the split must be a
function boundary.) Shard 10: `address_eq` width 109 → 80, FFT 509.8M → 374.1M;
total 2.106G → 1.973G.
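A minimal Python sketch of the split, assuming an address is a sequence of 32 limbs. The function names mirror the commit, but the bodies are illustrative rather than the Aiur source, and `address_eq_full` is a hypothetical reference used only to check equivalence.

```python
from typing import Sequence

LIMBS = 32  # assumption: 32 limbs per address, as described above

def address_eq_full(a: Sequence[int], b: Sequence[int]) -> bool:
    """Reference: compare all 32 limbs (the pre-split behavior)."""
    return all(a[i] == b[i] for i in range(LIMBS))

def address_eq_tail(a: Sequence[int], b: Sequence[int]) -> bool:
    """Cold path: limbs 1..31, only reached when limb 0 already matched."""
    return all(a[i] == b[i] for i in range(1, LIMBS))

def address_eq(a: Sequence[int], b: Sequence[int]) -> bool:
    """Hot path: reject on limb 0, otherwise delegate to the cold path."""
    if a[0] != b[0]:
        return False
    return address_eq_tail(a, b)

x = list(range(32))
differs_at_0 = [99] + x[1:]
differs_at_31 = x[:31] + [99]
for other in (x, differs_at_0, differs_at_31):
    assert address_eq(x, other) == address_eq_full(x, other)
```

In Python the split changes nothing about cost; the point is only that the two-function form returns the same answer, while in Aiur the separate function is what lets the hot circuit stay narrow.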
3. Hot/cold split of `whnf_with_spine` (commit `0e30d78`).
On reduction-heavy consts this is the single hottest circuit (width 89 × 3.36M
rows), its width set by two arms a minority of rows reach: Const-head and
Proj-head dispatch. Factored both into `whnf_const_head` / `whnf_proj_head`. The
Proj arm is the jackpot: 3.4K rows but nearly half the width. Shard 9: width
89 → 34, 6.48G → 2.48G.
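As a rough sanity check of the width × height model on this split, the arithmetic can be sketched in Python. The widths and heights are the figures reported for commit `0e30d78`; `fft_proxy` is a hypothetical helper, and the sketch assumes the `whnf_with_spine` row count is unchanged by the split. It is a relative proxy only, not the profiler's absolute FFT figure.

```python
def fft_proxy(width: int, height: int) -> int:
    """Proxy for the stated cost model: full width charged on every row."""
    return width * height

ROWS = 3_360_000  # whnf_with_spine rows (3.36M)

before = fft_proxy(89, ROWS)
after = (
    fft_proxy(34, ROWS)          # narrowed whnf_with_spine
    + fft_proxy(77, 821_490)     # whnf_const_head
    + fft_proxy(56, 3_452)       # whnf_proj_head
)
print(f"proxy ratio after/before: {after / before:.2f}")  # 0.59
```

The measured figures give (2.48G + 1.24G + 0.002G) / 6.48G ≈ 0.57, so the simple proxy lands close to the profiled result.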
4. Non-tail, single-definition `collect_spine` (commit `5b7fabc`).
`collect_spine` peeled the App chain with a tail-recursive accumulator. Tail
recursion buys nothing in Aiur (stack depth is free), and the accumulator landed
in the memo key, blocking reuse: 0 cache hits across 5.38M rows. Rewrote it
non-tail, keyed on the expression alone (2.19M hits now), and merged the two
verbatim copies into one definition in `kernelTypes`. Shard 9: 2.65G → 0.96G.

5. Keyed skip-set for sharded check (commit `a52967f`).
The largest `address_eq` source on a sharded check was `addr_in_list`, an O(N)
linear scan of the assumption-leaf list per checked const (1.81M calls on shard
9). Replaced with an `RBTreeMap` skip-set keyed on the first 4 address bytes:
build once, membership = one rbtree lookup + one confirming `address_eq`. Shard 9:
`address_eq` 2.07M rows / 3.48G → 274K rows / 0.40G; total 41.7G → 37.8G.

6. Compact symbolic Nat reduction (commits `dbc4177`, `5dcab7f`, `f7cfe23`).
`Nat` primitives applied to (symbolic base, literal) delta-unfolded into towers
the kernel then re-walked: `Nat.add x 0xC0` → `succ^192(x)`, and `Nat.div x 2`
(what `x >>> k` unfolds to, k times) → an exploding division algorithm. Three
coordinated commits keep them compact:
- `dbc4177`: `try_nat_linear_rec` reduces `Nat.rec base (succ-step) (Lit n)` with
  a symbolic base directly to `Nat.add base (Lit n)` instead of n iota steps.
- `5dcab7f`: whnf (`try_nat_offset_stuck`) leaves `Nat.add base (Lit n)` stuck as
  a compact offset; def-eq (`nat_offset_of` + `try_def_eq_nat`) decomposes each
  side to `base + KLimbs` and decides `base ≟ base` on equal offsets.
- `f7cfe23`: generalizes the stuck rule to `Nat.div`/`Nat.mod base (Lit k)`
  (k ≥ 2), so symbolic `>>>` reduces to nested-stuck-div in O(1).

Benchmarks: `x + D = Nat.rec x succ-step D` goes O(D) → depth-independent
(328M → 28.5M at D=512, −91%); `x >>> 18 = (x >>> 9) >>> 9` goes 15.65G → 455M
(−97%), matching `x >>> 6`. All magnitudes stay `KLimbs` (no Goldilocks
collapse); only de Bruijn / top-position indices are `G`.

7. Narrow Ixon expr deserialization (commit `fa3447e`).
On large-closure consts, ingress deserialization dominates: the two hottest
parsers are `get_expr` and `get_app_telescope`, each carrying width set by a
single arm. Split `get_expr`'s `Let` arm (three sub-`get_expr` calls + triple
store) into `get_expr_let`, dropping `get_expr` width 102 → 75. Changed
`get_app_telescope`'s `func: Expr` parameter to `func: &Expr` (load inside),
dropping width 95 → 78. Ingress profile (2000-const UTF-8 closure, run_check
load-only): `get_expr` 766M → 563M, `get_app_telescope` 477M → 392M, total
ingress 3.13G → 2.85G.
8. lbr-trimmed context keys for whnf / infer / def-eq memoization (commit `d96c738`).
Aiur auto-memoizes `whnf` / `k_infer` / `k_is_def_eq` on the full
`(expr, types, …)` argument tuple, so the same closed reduction gets a fresh
memo key at every binder depth and misses. Mirror Rust's `whnf_key` /
`infer_key` / `def_eq_ctx_key` (`src/ix/kernel/tc.rs`): trim `types` to the
suffix reachable from the expression's loose-bvar range (`ctx_trim` /
`ctx_reachable`, a fixpoint that closes over kept frames' own ranges) before the
memoized call. A closed subterm (`lbr == 0`) keys to the empty context and shares
its result across every depth. Eager substitution is unchanged (matches the
Lean / Rust reference); the win is purely cache keying. Fast paths
(closed → `Nil`, full-reach → as-is) keep the boundary cheap.
FFT (`ix check <const> --stats-out`): `Nat.add_comm` 58.28M → 57.90M (−0.66%);
`Int.emod_emod_of_dvd` 5.090G → 4.691G (−7.85%, inference / omega-heavy); Init
shard 10 4.738G → 4.721G (−0.37%). Net-positive everywhere measured.
9. Multi-arg beta with a narrow substitution walk (commit `3203691`).
`whnf_apply_beta` substituted one spine argument per `expr_inst1`, re-walking
the (shrinking) body for each, so a K-argument application `(λλλ. body) x y z`
walked the body K times. Substitution (`expr_inst1` + `expr_inst1_walk`) was
~53% of FFT on `omega`-heavy proofs. Mirror Rust's multi-arg beta
(`src/ix/kernel/whnf.rs:541-567`): `peel_beta` consumes the whole telescope and
`expr_inst_many` (= `simul_subst`) substitutes all consumed args in one walk.
Single-arg betas stay on the cheap `expr_inst1` path, so short telescopes don't
pay the list overhead, which is why an earlier unconditional `simul_subst`
regressed and was reverted. A hot/cold split factors the multi-arg walk's `BVar`
arm (`list_length` / `list_lookup` / `expr_lift`) into `expr_inst_many_bvar`,
keeping the hot App/Lam walk narrow (width 82 → ~53, the same pattern as areas
2–3). FFT (`ix check <const> --stats-out`): `…Vector.extract_append._proof_1`
56.64G → 38.31G (−32.4%); `Nat.add_comm` 57.90M → 56.17M; Init shard 10
4.721G → 4.668G (−1.1%); shard 9 16.346G → 16.314G.
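The equivalence between K sequential substitutions and one simultaneous walk can be checked on a toy de Bruijn term language. This Python sketch uses a hypothetical tagged-tuple encoding and helper names (`lift`, `inst_many`, `beta_sequential`, `beta_many`); it does not model the single-arg fast path or the hot/cold split.

```python
def lift(t, k, cutoff=0):
    """Shift free bvars (index >= cutoff) up by k."""
    tag = t[0]
    if tag == "bvar":
        return ("bvar", t[1] + k) if t[1] >= cutoff else t
    if tag == "app":
        return ("app", lift(t[1], k, cutoff), lift(t[2], k, cutoff))
    if tag == "lam":
        return ("lam", lift(t[1], k, cutoff + 1))
    return t  # const

def inst_many(t, args, depth=0):
    """Replace bvar depth+i with args[i] (args[0] = innermost binder) in one walk."""
    tag = t[0]
    if tag == "bvar":
        i = t[1]
        if i < depth:
            return t                               # bound inside the body
        if i < depth + len(args):
            return lift(args[i - depth], depth)    # substituted, lifted past binders
        return ("bvar", i - len(args))             # free: binders were removed
    if tag == "app":
        return ("app", inst_many(t[1], args, depth), inst_many(t[2], args, depth))
    if tag == "lam":
        return ("lam", inst_many(t[1], args, depth + 1))
    return t

def beta_sequential(fn, spine):
    """One full body walk per argument."""
    for arg in spine:
        assert fn[0] == "lam"
        fn = inst_many(fn[1], [arg])
    return fn

def beta_many(fn, spine):
    """Peel the lambda telescope, then substitute all consumed args in one walk."""
    n, body = 0, fn
    while n < len(spine) and body[0] == "lam":
        body, n = body[1], n + 1
    return inst_many(body, list(reversed(spine[:n]))), spine[n:]

body = ("app", ("app", ("bvar", 2), ("bvar", 1)), ("app", ("bvar", 0), ("bvar", 3)))
fn = ("lam", ("lam", ("lam", body)))
spine = [("bvar", 0), ("app", ("bvar", 1), ("const", "c")), ("const", "z")]  # open args

assert beta_many(fn, spine) == (beta_sequential(fn, spine), [])
```

The arguments deliberately contain free variables, so the check exercises the lifting in both paths rather than only the closed-argument case.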
10. Gate the primitive-op gauntlet behind a memoized classifier (commit `87cb8b0`).
`whnf_const_head` ran five `try_*` dispatchers (nat / str / bitvec / native /
decidable) in sequence on every `Const` head, each comparing the head
address against its family's primitive addresses, which is wasted on the common
non-primitive head. Worse, several `try_*` speculatively WHNF their arguments
(to test for literal / decidable operands), so a non-primitive head still
triggered deep argument reductions that were then thrown away. Factor the chain
into `try_address_primitives` and gate it behind `prim_any_addr`: a sum of the 43
head-dispatched `address_eq` checks, memoized per head address (computed once
per const, not per whnf). A `0` means no `try_*` can match, so dispatch jumps
straight to projection-definition / delta. Validated by a differential assert
(`prim_any_addr` must be 1 whenever the gauntlet actually matched) run over the
suite + `Int.emod_emod_of_dvd` + `Vector._proof_1`; it caught two missing
addresses (`string_utf8_byte_size`, `punit_size_of_1`), then passed across
millions of reductions, and the assert was removed afterward. FFT
(`ix check <const>`): `…Vector.extract_append._proof_1` 38.31G → 28.02G (−26.8%,
the skipped speculative arg-WHNF); `Int.emod_emod_of_dvd` 4.69G → 3.97G (−15%);
`Nat.add_comm` 56.17M → 56.06M; Init shard 9 16.314G → 16.279G; shard 10
4.668G → 4.659G.
Tooling
Resolve private `Lean.Name` args in check/prove (commit `6ec124a`).
`parseName` (naive dot-split; an all-digit segment → `Name.mkNum`) can't rebuild
a private name like `_private.Init.Data.Vector.Extract.0.…_proof_1` from its
display string, so `ix check` / `ix prove` reported it "not found". Added
`toString` fallbacks, `resolveName` (compiled-in env) and `resolveIxeAddr`
(`.ixe` env), that match the arg against each constant's displayed name when the
direct lookup misses. Covers both commands (shared `forEachClaim`), across the
compiled-in and `--ixe` paths. Tooling only; no kernel change.

Soundness
- (each shard still checks).
- `address_eq` returns the same value as a full 32-limb compare: one
  differing limb proves inequality.
- A skip-set key collision can only cause a missed skip (a frontier const gets
  rechecked, never wrongly trusted); the confirming `address_eq` rules out false
  positives; the 4-byte key makes collisions negligible regardless.
- Leaving `Nat.add / div / mod symbolic literal` stuck is the correct
  normal form (these are irreducible for a symbolic base). `try_def_eq_nat`
  returns `eq = 1` only when offsets are equal and bases are def-eq (`+ k` is
  injective); otherwise it falls back to the generic path. `BAD` negative cases
  still get rejected (no false accept), and recursor-on-symbolic-offset reduces
  correctly.
- Trimming `types` to the loose-bvar-reachable suffix changes only the
  memo key, not the computation: a closed expr never consults the trimmed
  frames, and `ctx_reachable`'s fixpoint keeps every frame a kept frame's stored
  type can reach. Eager substitution and all reduction logic are untouched.
- `expr_inst_many` is the standard simultaneous de Bruijn substitution
  (`simul_subst`); `peel_beta` consumes exactly one argument per `Lam`, so
  `(λλλ. body) x y z` and three sequential single substitutions produce the same
  term. The hot/cold split only relocates the `BVar` arm, with an identical result.
  Eager substitution semantics are unchanged (matches the Lean / Rust reference).
- `prim_any_addr` is `1` whenever the gauntlet would reduce, so skipping it on a
  `0` only elides work that returns "no match". A differential `assert` checked
  exactly this invariant on the suite plus the two heavy consts (millions of
  reductions across every primitive family) before the gate was trusted; the
  assert caught the only two omissions and then ran clean.
- The `ixvm` suite passes (297 tests) at every commit.